Abstract.  Let $M(m,n)$ be the minimum number of pairwise comparisons which will always suffice to merge two linearly ordered lists of lengths $m$ and $n$. We prove that $M(m,m+d) = 2m+d-1$ whenever $m \ge 2d-2$.

This generalizes earlier results of Graham and Karp ($d = 1$), Hwang and Lin ($d = 2, 3$), and Knuth ($d = 4$), and shows that the standard linear merging algorithm is optimal whenever $m \le n \le \lfloor 3m/2 \rfloor + 1$.
1. Introduction.


Suppose we are given two linearly ordered sets $A$ and $B$ consisting of elements

$$a_1 < a_2 < \cdots < a_m \quad\text{and}\quad b_1 < b_2 < \cdots < b_n,$$

respectively, where the $m+n$ elements are distinct. The problem of merging these sets into a single ordered set by means of a sequence of pairwise comparisons is of obvious practical interest, and several algorithms have been devised for handling it.

An intriguing theoretical problem is to determine $M(m,n)$, the minimum number of comparisons which will always suffice to merge the sets in a decision tree model [5]. Evaluating this function in general seems quite difficult, and values are known for only a few special cases, including $m \le 3$ ([1], [2], and [4]). In one direction, an upper bound for $M(m,n)$ is provided by a simple procedure variously referred to as the normal, standard, linear, or tape merge algorithm. Here the two smallest elements (initially $a_1$ and $b_1$) are compared, and the smaller of these is deleted from its list and placed on an output list. The process is repeated until one list is exhausted. It is easy to see that this algorithm requires $m+n-1$ comparisons in the worst case, so that

$$M(m,n) \le m+n-1.$$
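For concreteness, the following Python sketch implements the standard merge just described and counts its comparisons. The function names are our own. Enumerating every interleaving of two small lists confirms the worst case of $m+n-1$ comparisons when both lists are nonempty.

```python
from itertools import combinations


def linear_merge(a, b):
    """Standard (tape) merge of two sorted lists; returns (merged, comparisons)."""
    i = j = comparisons = 0
    out = []
    while i < len(a) and j < len(b):
        comparisons += 1                      # compare the two smallest remaining elements
        if a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out + a[i:] + b[j:], comparisons   # the leftover tail needs no comparisons


def worst_case(m, n):
    """Maximum comparison count over all interleavings of m + n distinct keys."""
    worst = 0
    for pos in combinations(range(m + n), m):  # positions of a_1..a_m in the merged order
        a = list(pos)
        b = [x for x in range(m + n) if x not in pos]
        merged, c = linear_merge(a, b)
        assert merged == list(range(m + n))
        worst = max(worst, c)
    return worst


print(worst_case(4, 7))  # 10, i.e. m+n-1
```

The worst case arises when the two largest elements come from different lists. In that situation neither list is exhausted until only one element remains.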

Although better algorithms exist for many cases, R. L. Graham and R. M. Karp independently observed that this algorithm is optimal when $|n-m|$ is 0 or 1. That is, they showed that

$$M(m,m) = 2m-1 \quad\text{and}\quad M(m,m+1) = 2m.$$
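The lower halves of these identities follow from a standard argument, sketched here rather than taken from the text. Suppose two elements from different lists are adjacent in the final order. No other comparison distinguishes that outcome from the one in which they are swapped, so any correct merge must compare them directly. Perfectly alternating inputs contain $m+n-1$ such pairs. The hypothetical helper below counts these pairs for a merged order written as a string of list labels.

```python
def forced_comparisons(order):
    """`order` is the merged sequence written as list labels, e.g. "abab".
    Every adjacent pair with different labels must be compared directly by any
    correct merge algorithm, so this count is a lower bound on that input."""
    return sum(1 for x, y in zip(order, order[1:]) if x != y)


m = 4
print(forced_comparisons("ab" * m))        # 7 = 2m-1: lower bound for M(m, m)
print(forced_comparisons("b" + "ab" * m))  # 8 = 2m:   lower bound for M(m, m+1)
```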

Later Hwang and Lin [3] proved that

$$M(m,m+2) = 2m+1 \text{ for } m \ge 2 \quad\text{and}\quad M(m,m+3) = 2m+2 \text{ for } m \ge 4,$$

while Knuth [5, p. 204] verified that

$$M(m,m+4) = 2m+3 \text{ for } m \ge 6.$$

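In this paper we generalize these results by proving that

$$M(m,m+d) = 2m+d-1 \quad\text{for } m \ge 2d-2.$$

Intuitively, this means that the standard merge algorithm is optimal, in the worst-case sense, whenever $m \le n \le \lfloor 3m/2 \rfloor + 1$. Writing $n = m+d$, the condition $m \ge 2d-2$ is equivalent to $d \le \lfloor m/2 \rfloor + 1$, that is, to $n \le \lfloor 3m/2 \rfloor + 1$.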
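The following quick numerical check confirms that the condition $m \ge 2d-2$, with $n = m+d$ and $d \ge 0$, describes exactly the range $m \le n \le \lfloor 3m/2 \rfloor + 1$. The helper name is our own.

```python
def certified_n(m):
    """All n = m + d with d >= 0 and m >= 2d - 2, i.e. where the theorem applies."""
    return [m + d for d in range(m + 2) if m >= 2 * d - 2]


for m in range(1, 8):
    ns = certified_n(m)
    print(m, ns[0], ns[-1], 3 * m // 2 + 1)   # the last two columns agree
```

For $m = 7$ the largest certified value is $n = 11$. This is consistent with the open case $m = 7$, $n = 12$ mentioned at the end of the paper.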

2.  Oracles. 

A lower bound for $M(m,n)$ will be produced by means of an "oracle", the proof technique used, for example, by Knuth [5, Section 5.3.2].

In his formulation, when presented with a comparison $a_i$ vs. $b_j$,
an  oracle  announces  which  is  larger  and  simultaneously  chooses  a strategy 
for  answering  further  questions  so  as  to  force  a large  number  of  additional 
comparisons  to  be  made.  A useful  lower  bound  is  obtained  from  an  oracle 
that  has  an  effective  strategy  for  dealing  with  any  comparison  it  might 
encounter. 

In addition to an oracle that provides a lower bound for $M(m,n)$, oracles are needed to furnish lower bounds for two other functions. Let $/M(m,n)$ be the number of comparisons required to merge two lists for which, unknown to the merger, $a_1$ is in fact greater than $b_1$. An oracle for this function must therefore make all pronouncements consistent with $a_1 > b_1$. Similarly, let $/M\backslash(m,n)$ be the number of comparisons required when $a_1$ is greater than $b_1$ and $a_m$ is less than $b_n$, again unknown to the merger. Occasionally we shall use the notation $M\backslash(m,n)$ to denote the number of comparisons required to merge two lists when $a_m$ is less than $b_n$. This is not another new function, though, since by symmetry we have $M\backslash(m,n) = /M(m,n)$.

To illustrate these definitions, suppose $m = 2$ and $n = 4$. It is well known that $M(2,4) = 5$. However, there is a way to perform this merge in only 4 comparisons if in fact $a_1 > b_1$, by first comparing $a_1$ with $b_2$. If $a_1 > b_2$, the problem reduces to $M(2,2)$. Otherwise, comparing $a_1$ with $b_1$ reduces the problem to $M(1,3)$. Either way at most $1 + M(2,2) = 2 + M(1,3) = 4$ comparisons are used.

Thus $/M(2,4) \le 4$.
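The four-comparison procedure just described can be written out as a small Python sketch. The names are our own. On our reading of "unknown to the merger", the procedure must still produce the correct order on every input, but only inputs with $a_1 > b_1$ are charged. Where the text leaves the sub-merges open, the sketch assumes particular methods: a linear merge for the $M(2,2)$ branch, binary insertion for $M(1,3)$, and binary insertion of $a_2$ into all four $b$'s when $a_1 < b_1$. The enumeration reports the worst case over inputs with $a_1 > b_1$ and over all inputs.

```python
from itertools import combinations


def merge_2_4(a, b):
    """Merge sorted a (2 items) with sorted b (4 items) using the strategy above.
    Returns (merged list, number of comparisons)."""
    count = 0

    def less(x, y):                      # every comparison goes through here
        nonlocal count
        count += 1
        return x < y

    def insert(x, ys):                   # binary insertion: position of x in sorted ys
        lo, hi = 0, len(ys)
        while lo < hi:
            mid = (lo + hi) // 2
            if less(x, ys[mid]):
                hi = mid
            else:
                lo = mid + 1
        return lo

    a1, a2 = a
    if not less(a1, b[1]):               # a_1 > b_2: linear merge of {a_1,a_2} with {b_3,b_4}
        out, xs, ys = [b[0], b[1]], [a1, a2], b[2:]
        while xs and ys:
            out.append(xs.pop(0) if less(xs[0], ys[0]) else ys.pop(0))
        return out + xs + ys, count
    if less(b[0], a1):                   # b_1 < a_1 < b_2: insert a_2 into {b_2,b_3,b_4}
        rest = b[1:]
        p = insert(a2, rest)
        return [b[0], a1] + rest[:p] + [a2] + rest[p:], count
    p = insert(a2, b)                    # a_1 < b_1 (not charged): insert a_2 into all of b
    return [a1] + b[:p] + [a2] + b[p:], count


def check_all():
    worst_promised = worst_overall = 0
    for pos in combinations(range(6), 2):        # positions of a_1, a_2 in the merged order
        a = list(pos)
        b = [x for x in range(6) if x not in pos]
        merged, c = merge_2_4(a, b)
        assert merged == list(range(6))
        worst_overall = max(worst_overall, c)
        if a[0] > b[0]:                          # inputs satisfying a_1 > b_1
            worst_promised = max(worst_promised, c)
    return worst_promised, worst_overall


print(check_all())  # (4, 5)
```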

3.  An  Example. 

We illustrate the use of Knuth's oracles, and the strategies available to them, by verifying that $M(4,7) \ge 10$. Assume that oracles for achieving $M(m,n)$ and $/M(m,n)$ exist whenever $m+n \le 10$ (see [5]). We consider four cases.

(i) First, suppose a merge algorithm begins by comparing $a_1$ with $b_1$. The oracle declares that $a_1 > b_1$, and requires that subsequent comparisons merge $\{a_1, a_2, a_3, a_4\}$ with $\{b_2, b_3, \ldots, b_7\}$, using an $M(4,6)$ oracle. Thus $M(4,7) \ge 1 + M(4,6) = 1 + 9 = 10$ in this case.

(ii) If a merge algorithm begins by comparing $a_1$ with $b_j$, with $j \ge 2$, a more complex strategy is needed. The oracle declares that $a_1 < b_j$, and requires that later comparisons merge $\{a_1\}$ with $\{b_1\}$ and $\{a_2, a_3, a_4\}$ with $\{b_1, b_2, \ldots, b_7\}$, with the restriction that all future pronouncements are consistent with $a_1 < b_1 < a_2$. These restrictions ensure that information gained in merging one subproblem is of no help in the other, even though $b_1$ is in both. The situation is illustrated in Figure 1. The top row is $A$, the bottom $B$, with smaller elements to the left. The dotted lines represent the restrictions the oracle imposes on itself, and the subproblems are encircled. With this strategy, the oracle can force at least $1 + M\backslash(1,1) + /M(3,7) = 1 + 1 + 8 = 10$ comparisons to be made in this case as well. Thus any algorithm which initially uses $a_1$ requires at least 10 comparisons.


Figure 1.  $a_1 < b_j$, $j \ge 2$.

(iii) An algorithm that first compares $a_2$ with $b_j$, with $j \le 3$, can be handled in a manner similar to (ii). The oracle declares that $a_2 > b_j$ and requires that future comparisons merge $\{a_1\}$ with $\{b_1, b_2, b_3, b_4\}$ and $\{a_2, a_3, a_4\}$ with $\{b_4, b_5, b_6, b_7\}$, under the restrictions $a_1 < b_4 < a_2$.

See Figure 2. The number of comparisons required in this case is at least

$$1 + M\backslash(1,4) + /M(3,4) = 1 + 3 + 6 = 10.$$


Figure 2.  $a_2 > b_j$, $j \le 3$.
(iv) If the first comparison is $a_2$ vs. $b_j$ with $j \ge 4$, a simpler strategy will work. The oracle declares that $a_2 < b_j$, and insists that later comparisons merge $\{a_1, a_2\}$ with $\{b_1, b_2, b_3\}$ and $\{a_3, a_4\}$ with $\{b_4, b_5, b_6, b_7\}$, as in Figure 3. The number of comparisons required is at least $1 + M(2,3) + M(2,4) = 1 + 4 + 5 = 10$.


Figure 3.  $a_2 < b_j$, $j \ge 4$.

We have shown that any merge algorithm that begins with a comparison using either $a_1$ or $a_2$ requires at least 10 comparisons for this problem. By symmetry, the same is true for $a_3$ and $a_4$. Having considered all cases, we conclude that $M(4,7) \ge 10$.
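The smaller subproblem values quoted in cases (ii)-(iv) can be confirmed by an exhaustive minimax search over decision trees. The Python sketch below is our own construction, intended only for very small $m$ and $n$. It encodes each possible merged order by $r_i$, the number of $b$'s below $a_i$. It reads "unknown to the merger" as described for $/M(2,4)$ above: a branch counts until a single order remains, unless no outcome satisfying the restriction is left. The larger values $M(4,6)$ and $/M(3,7)$ are not attempted here; they are taken from [5].

```python
from functools import lru_cache
from itertools import combinations_with_replacement


def min_comparisons(m, n, promise=lambda r: True):
    """Exact minimax number of comparisons to merge a_1<...<a_m with b_1<...<b_n,
    charging only outcomes that satisfy `promise` (which the merger is not told)."""
    # Encoding (our own choice): r[i] = number of b's below a_{i+1}, so each
    # possible merged order is a nondecreasing tuple with entries in 0..n.
    start = frozenset(combinations_with_replacement(range(n + 1), m))
    queries = [(i, j) for i in range(m) for j in range(1, n + 1)]  # a_{i+1} vs b_j

    @lru_cache(maxsize=None)
    def cost(state):
        # Finished once the order is known, or no charged outcome is left
        if len(state) <= 1 or not any(promise(r) for r in state):
            return 0
        best = None
        for i, j in queries:
            above = frozenset(r for r in state if r[i] >= j)  # answer: a_{i+1} > b_j
            below = state - above                             # answer: a_{i+1} < b_j
            if above and below:                               # skip queries with a known answer
                c = 1 + max(cost(above), cost(below))
                best = c if best is None else min(best, c)
        return best

    return cost(start)


def M(m, n):
    return min_comparisons(m, n)


def M_slash(m, n):        # /M: a_1 > b_1, unknown to the merger
    return min_comparisons(m, n, lambda r: r[0] >= 1)


def M_backslash(m, n):    # M\: a_m < b_n, unknown to the merger
    return min_comparisons(m, n, lambda r: r[-1] <= n - 1)


# The text uses M(2,3)=4, M(2,4)=5, M\(1,1)=1, M\(1,4)=3 and /M(3,4)=6
print(M(2, 3), M(2, 4), M_backslash(1, 1), M_backslash(1, 4), M_slash(3, 4))
```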

The  two  types  of  strategy  illustrated  above  endow  an  oracle  with 
sufficient  power  to  prove  our  main  result  in  the  next  section.  In  the 
"simple"  strategy,  the  oracle  answers  the  query  and  divides  the  merge 
problem  into  two  disjoint  unrestricted  problems.  In  the  "complex"  strategy, 
there  is  an  element  of  B in  both  subproblems,  which  are  handled  by 
suitably restricted oracles. Oracles for the functions $/M(m,n)$ and $/M\backslash(m,n)$ use the same strategies, with one or both subproblems inheriting
the  restrictions  of  the  original.  A subproblem  may  have  one  list  empty 
in  degenerate  cases,  as  in  case  (i)  above.  In  all  cases,  though, 
each  subproblem  contains  fewer  elements  than  the  original  problem,  so 
that  inductive  proofs  can  be  used. 




4. The Main Result.


The  proof  of  our  theorem  is  simplified  by  first  establishing  a few 
preliminary  results. 

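Lemma 1.

(i) $/M\backslash(m,n) \le /M(m,n) \le M(m,n)$.

(ii) $/M(m+1,n+1) \ge /M(m,n) + 2$.

(iii) $/M\backslash(m+1,n+1) \ge /M\backslash(m,n) + 2$.

Proof. Part (i) is obvious, since any merge algorithm performs at least as well on more restricted problems. In part (ii), an oracle for $/M(m+1,n+1)$ can make all pronouncements consistent with $b_1 < a_1 < b_2 < a_2$, and force $\{a_2, a_3, \ldots, a_{m+1}\}$ to be merged with $\{b_2, b_3, \ldots, b_{n+1}\}$. Then the comparisons $a_1$ vs. $b_1$ and $a_1$ vs. $b_2$ cannot be avoided. The reason is that $a_1$ is adjacent to both $b_1$ and $b_2$ in the final order, and neither comparison belongs to the remaining subproblem. The proof of part (iii) is similar.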

We are now ready to prove the main result. Although we are really
interested  only  in  part  (a),  bounds  for  all  three  functions  must  be  proved 
simultaneously,  as  each  oracle  requires  the  help  of  at  least  one  other. 

Theorem  1. 

(a) $M(m,m+d) \ge 2m+d-1$ for $m \ge 2d-2$,

(b) $/M(m,m+d) \ge 2m+d-1$ for $m \ge 2d-1$,

(c) $/M\backslash(m,m+d+2) \ge 2m+d$ for $m \ge 2d-1$.

Proof. If (b) and (c) are true for the threshold values $m = 2d-1$, then they are also true for $m > 2d-1$ by repeated application of Lemma 1 (ii) and (iii). Also, if (b) is true for $m \ge 2d-1$, then Lemma 1 (i) implies that (a) is also true for $m \ge 2d-1$. Thus it is sufficient to prove the theorem for the threshold values of $m$ only, that is,

$$M(2d-2,3d-2) \ge 5d-5, \qquad /M(2d-1,3d-1) \ge 5d-3, \qquad\text{and}\qquad /M\backslash(2d-1,3d+1) \ge 5d-2.$$

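The proof is by induction on $d$. The starting values for $1 \le d \le 3$ are given in Knuth [5, p. 203]. Within the inductive step the three parts are proved in the order (a), (b), (c). The proof of (b) uses (a), and the proof of (c) uses (b), for the same value of $d$. All other bounds invoked are for smaller values of $d$.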


Part (a). Suppose an algorithm begins by comparing $a_i$ with $b_j$, where $i = 2k-1$ and $j \le 3k-2$, for some integer $k$ satisfying $1 \le k < d$. The oracle proclaims that $a_i > b_j$ and follows the simple strategy, yielding

$$
\begin{aligned}
M(2d-2,3d-2) &\ge 1 + M(2k-2,3k-2) + M(2(d-k),3(d-k)) \\
&\ge 1 + (5k-5) + (5(d-k)-1) \\
&= 5d-5.
\end{aligned}
$$


If $i = 2k-1$ and $j \ge 3k-1$, the oracle announces that $a_i < b_j$ and uses the complex strategy, with $b_{3k-2}$ in both subproblems. This leads to

$$
\begin{aligned}
M(2d-2,3d-2) &\ge 1 + M\backslash(2k-1,3k-2) + /M(2(d-k)-1,3(d-k)+1) \\
&\ge 1 + /M(2k-1,3k-2) + /M\backslash(2(d-k)-1,3(d-k)+1) \\
&\ge 1 + (5k-4) + (5(d-k)-2) \\
&= 5d-5.
\end{aligned}
$$


This settles the case where $i$ is odd. Since $m = 2d-2$ is even, reversing the order of the elements in $A$ and $B$ maps all elements of $A$ with even subscripts onto those with odd subscripts. Thus by symmetry we have handled the even case as well.


Part (b). Suppose the first comparison of an algorithm is $a_i$ vs. $b_j$ with $i = 2k-1$ and $j \le 3k-2$, where $1 \le k \le d$. The oracle proclaims that $a_i > b_j$ and uses the complex strategy, with $b_{3k-1}$ in both subproblems. In this case we have

$$
\begin{aligned}
/M(2d-1,3d-1) &\ge 1 + /M\backslash(2k-2,3k-1) + /M(2(d-k)+1,3(d-k)+1) \\
&\ge 1 + (5k-5) + (5(d-k)+1) \\
&= 5d-3.
\end{aligned}
$$

If $i = 2k-1$ and $j \ge 3k-1$, the oracle announces that $a_i < b_j$. The simple strategy yields

$$
\begin{aligned}
/M(2d-1,3d-1) &\ge 1 + /M(2k-1,3k-2) + M(2(d-k),3(d-k)+1) \\
&\ge 1 + (5k-4) + 5(d-k) \\
&= 5d-3.
\end{aligned}
$$

Now suppose $i = 2k$ and $j \le 3k$, with $1 \le k < d$. Choosing $a_i > b_j$, the oracle follows the complex strategy, leading to

$$
\begin{aligned}
/M(2d-1,3d-1) &\ge 1 + /M\backslash(2k-1,3k+1) + /M(2(d-k),3(d-k)-1) \\
&\ge 1 + (5k-2) + (5(d-k)-2) \\
&= 5d-3.
\end{aligned}
$$

Otherwise, if $i = 2k$ and $j \ge 3k+1$, the simple strategy with $a_i < b_j$ produces

$$
\begin{aligned}
/M(2d-1,3d-1) &\ge 1 + /M(2k,3k) + M(2(d-k)-1,3(d-k)-1) \\
&\ge 1 + (5k-1) + (5(d-k)-3) \\
&= 5d-3.
\end{aligned}
$$
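Part (c). Suppose the first comparison is $a_i$ vs. $b_j$ with $i = 2k-1$ and $j \le 3k-1$, where $1 \le k \le d$. The oracle declares that $a_i > b_j$ and follows the simple strategy, yielding

$$
\begin{aligned}
/M\backslash(2d-1,3d+1) &\ge 1 + /M(2k-2,3k-1) + M\backslash(2(d-k)+1,3(d-k)+2) \\
&\ge 1 + /M\backslash(2k-2,3k-1) + /M(2(d-k)+1,3(d-k)+2) \\
&\ge 1 + (5k-5) + (5(d-k)+2) \\
&= 5d-2.
\end{aligned}
$$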

The case $i = 2k-1$ and $j \ge 3k$ is the mirror image of this case.

If $i = 2k$ and $j \le 3k+1$, with $1 \le k < d$, the simple strategy works again. The oracle declares $a_i > b_j$, and we have

$$
\begin{aligned}
/M\backslash(2d-1,3d+1) &\ge 1 + /M(2k-1,3k+1) + M\backslash(2(d-k),3(d-k)) \\
&\ge 1 + /M\backslash(2k-1,3k+1) + /M(2(d-k),3(d-k)) \\
&\ge 1 + (5k-2) + (5(d-k)-1) \\
&= 5d-2.
\end{aligned}
$$

Finally, the case $i = 2k$ and $j \ge 3k+2$ is contained in the mirror image of this case.
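The bookkeeping in the inductive step can be checked mechanically. The Python sketch below uses names of our own choosing. It transcribes each case of parts (a), (b) and (c) as a list of subproblems, where each restricted term is replaced by the function the proof actually bounds, after applying symmetry and Lemma 1 (i). For many values of $d$ it then asserts three things. First, the subproblem sizes account for every element, with one $b$ shared in the complex strategy. Second, each term falls under the induction (a smaller $d$, or the same $d$ for a part proved earlier). Third, the bounds from Theorem 1 add up to the target. The check covers only this arithmetic; it does not verify that the oracle strategies themselves are legal.

```python
def lower_bound(kind, m, n):
    """Bound claimed by Theorem 1 for kind "M", "/M" or "/M\\"; None if out of range."""
    if m == 0:
        return 0                                  # merging an empty list needs no comparisons
    if kind == "/M\\":                            # part (c): /M\(m, m+d+2) >= 2m+d
        d = n - m - 2
        return 2 * m + d if m >= 2 * d - 1 else None
    d = n - m
    if kind == "M":                               # part (a): M(m, m+d) >= 2m+d-1
        return 2 * m + d - 1 if m >= 2 * d - 2 else None
    if kind == "/M":                              # part (b): /M(m, m+d) >= 2m+d-1
        return 2 * m + d - 1 if m >= 2 * d - 1 else None
    raise ValueError(kind)


def param_d(kind, m, n):
    return n - m - 2 if kind == "/M\\" else n - m


RANK = {"M": 0, "/M": 1, "/M\\": 2}               # parts (a), (b), (c) are proved in this order


def proof_cases(d):
    """Yield (part, target, m, n, shared, terms) for each case of the inductive step."""
    for k in range(1, d + 1):
        e = d - k
        if k < d:                                 # part (a), i = 2k-1
            yield "M", 5*d - 5, 2*d - 2, 3*d - 2, 0, [("M", 2*k - 2, 3*k - 2), ("M", 2*e, 3*e)]
            yield "M", 5*d - 5, 2*d - 2, 3*d - 2, 1, [("/M", 2*k - 1, 3*k - 2), ("/M\\", 2*e - 1, 3*e + 1)]
        # parts (b) and (c), i = 2k-1
        yield "/M", 5*d - 3, 2*d - 1, 3*d - 1, 1, [("/M\\", 2*k - 2, 3*k - 1), ("/M", 2*e + 1, 3*e + 1)]
        yield "/M", 5*d - 3, 2*d - 1, 3*d - 1, 0, [("/M", 2*k - 1, 3*k - 2), ("M", 2*e, 3*e + 1)]
        yield "/M\\", 5*d - 2, 2*d - 1, 3*d + 1, 0, [("/M\\", 2*k - 2, 3*k - 1), ("/M", 2*e + 1, 3*e + 2)]
        if k < d:                                 # parts (b) and (c), i = 2k
            yield "/M", 5*d - 3, 2*d - 1, 3*d - 1, 1, [("/M\\", 2*k - 1, 3*k + 1), ("/M", 2*e, 3*e - 1)]
            yield "/M", 5*d - 3, 2*d - 1, 3*d - 1, 0, [("/M", 2*k, 3*k), ("M", 2*e - 1, 3*e - 1)]
            yield "/M\\", 5*d - 2, 2*d - 1, 3*d + 1, 0, [("/M\\", 2*k - 1, 3*k + 1), ("/M", 2*e, 3*e)]


def check_inductive_step(d_max=50):
    for d in range(1, d_max + 1):
        for part, target, m, n, shared, terms in proof_cases(d):
            # every element is used once; the complex strategy shares one b
            assert sum(t[1] for t in terms) == m
            assert sum(t[2] for t in terms) == n + shared
            for kind, tm, tn in terms:
                dd = param_d(kind, tm, tn)
                # induction: smaller d, or the same d for a part proved earlier
                assert dd < d or (dd == d and RANK[kind] < RANK[part]), (part, d, kind, tm, tn)
            bounds = [lower_bound(*t) for t in terms]
            assert None not in bounds, (part, d, terms)
            assert 1 + sum(bounds) == target, (part, d, terms)
    return True


print(check_inductive_step())  # True
```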

In conclusion, we note that Knuth [5, p. 206] has made several conjectures concerning the behavior of $M(m,n)$, such as

$$M(m+1,n+1) \ge M(m,n) + 2.$$

In view of Theorem 1, it seems reasonable to add

$$M(m+2,n+3) \ge M(m,n) + 5$$

to the list.

Also, it would be interesting to know the precise range of $m$ and $n$ for which the linear merge algorithm is optimal. No instances have been found outside the range $m \le n \le \lfloor 3m/2 \rfloor + 1$, but cases as small as $m = 7$, $n = 12$ remain open.